The Algebraic Complexity of Maximum Likelihood 
Estimation for Bivariate Missing Data 

Serkan Ho§ten 
Department of Mathematics, San Francisco State University 
Seth SuUivant 
Department of Mathematics, Harvard University 

Abstract 

We study the problem of maximum hkehhood estimation for general 
patterns of bivariate missing data for normal and multinomial random 
variables, under the assumption that the data is missing at random 
(MAR). For normal data, the score equations have nine complex so- 
lutions, at least one of which is real and statistically significant. Our 
computations suggest that the number of real solutions is related to 
whether or not the MAR assumption is satisfied. In the multinomial 
case, all solutions to the score equations are real and the number of real 
solutions grows exponentially in the number of states of the underly- 
ing random variables, though there is always precisely one statistically 
significant local maxima. 

1 Introduction 

A common problem in statistical analysis is dealing with missing covariates 
in some of the replicates of multivariate data. A typical instance arises 
during longitudinal studies in the social and biological sciences, when par- 
ticipants may miss appointments or drop out of the study altogether. Over 
very long term studies nearly all replicates will involve some missing data, 
so it is usually impractical to throw out replicates with missing covariates. 
Furthermore, the underlying cause for the censoring (e.g. a subject dies) 
might play an important role in inference with the missing data that will 
lead to false conclusions in the complete case analysis. Thus, specialized 
techniques are needed in the setting where some of the data is missing. A 
useful reference for this material is [4j, from which we will draw notation 
and definitions. 

In this paper, we undertake an algebraic study of maximum likelihood 
estimation for general patterns of bivariate missing data, under the assump- 
tion that the data is missing at random (MAR) [4J. This implies, in partic- 
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ular, that the censoring mechanism does not affect the maximization of the 
Ukehhood function with respect to the underlying parameters of the model, 
and thus the nonresponse is ignorable. 

Let Yi, . . . ,Yn be i.i.d. replicates with d-covariates Xi, . . . , X^. Let M. 
be a parametric model for the joint distribution of the Xi. Let M be the 
d X n 0/1-matrix that is the indicator function for the missing entries of 
the Yj; that is Mij = 1 if and only if the ith covariate of Yj is missing. 
Under the assumption that the data are missing at random (the missing 
data mechanism does not depend on the values of the missing data), we 
can perform likelihood inference with respect to an underlying model for 
the way that the fully observed data was generated. In particular, if f{y\0) 
is the probability density function of random variables X, we have the log- 
likelihood function for the observed data: 

n 

mY,M)=^logf{Y,=y,\9,M), 
i=i 

where f{Yj = yj\6, M) denotes the marginal probability of observing Yj = yj 
with appropriate entries of yj censored 

fiYj = yj\0,M)= / /(^obs = 2/obs,^mis = a;mis|6')da;mis- 

JXi\Mij=l 

We wish to find the parameter values 9 that maximize this likelihood func- 
tion. 

Our focus in this paper is on the case when d = 2. With a general 
pattern of missing data in the bivariate case, we assume that our data comes 
in the following form. There are n complete cases where we obtain a 2- 
dimensional vector 1^. There are r cases where we only obtain variable Xi, 
and s cases where we only obtain variable X2. We denote these by Zi and 
Wi, respectively. The log- likelihood function becomes 

n r s 

my,w,z) = ^log/(y, = yj\e)+Y,logf{Zj = Zj\e)+Y,log /{Wj = wj\e) 
j=i i=i i=i 

and our goal is to maximize this function. 

One approach to determining the maximum likelihood estimate uses 
computational algebraic geometry. The connections between maximum like- 
lihood estimation and algebraic geometry was first extensively studied in [2] . 
A basic fact is that, if the critical equations (score equations) are rational 
functions of the parameters and the data, then the number of complex solu- 
tions to the critical equations is constant for generic (i.e. almost all) data. 
This fixed number is called the maximum likelihood degree (ML-degree for 
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short) of the model. The ML-degree is an intrinsic complexity measure of 
the score equations. In this paper, we compute the ML-degree in the bivari- 
ate missing data problem for Gaussian random variables and for multinomial 
random variables. 

The outline of this paper is as follows. In Section [2] we focus on the case 
where {Xi,X2) have a jointly normal distribution. We show that the ML- 
degree in this case is nine. Our simulations show that if the data is indeed 
generated from bivariate normal distributions, and the censoring mechanism 
is MCAR or MAR, then there is a unique real solution to the score equations, 
which is a local maximum. On the other hand, we also present examples of 
data, where either the model or the missing data mechanism are misspecified, 
where there can be two statistically relevant local maxima. The possible 
existence of multiple maxima is important to take into account when using 
the EM-algorithm to find the maximum likelihood estimate. In Section [3] 
we focus on the discrete case, where {Xi,X2) have a jointly multinomial 
distribution. In this setting, we give a combinatorial formula for the ML- 
degree. 

2 Bivariate Normal Random Variables 



We assume that X = {Xi,X2) ~ AA(/i, S) where E[X] = fJ- = (^i,/U2) and 

Cl2 ^22 

for j = 1, . . . , r and Wj ~ MifJ'2, (^22) for j = 1, . . . , s. Ignoring constants 
the log-likelihood function is equal to 



is the covariance matrix. Then we have Zj ~ A/'(^i,(Tii) 



^(/^,r|y,z/;,z 
1 



1 1 " 

--nlog(detS) - {Y^(Y, - f,Y^-HY, - f,)) 



2Hog(an)-^E(^. 
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slog ((722 ) 
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/^2j 



It is more convenient to use the entries of F := S" 



-1 _ 711 712 
_ 712 722 

computations. With this substitution, we get the identities 



m our 
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detr' 



0"22 
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and (T12 
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In the computations below we will also use a bar over a quantity to denote 
its average. The log-likelihood function becomes 

^ (n + r + s) log(det T) - log 722 - log 711 

-'^[{Y^-2i^iY;+til)-fn+2{Y^-iY;f,2+%l^i)+l^if^2hi^^^ 

_rdetT(Z-^_2^^Z + ^,l)-l^{W-2^,,W + f,l) (1) 
2 722 2 711 

The critical equations for £{fi, T; y, z, w) are: 



-- = "[(^i - /^i)7ii + (^2 - /"2)7l2] + ^'^^^(-^ - /^i) 
d^i 722 

r/T^ ^ NT detr,— , 

-- = ^T- (J^2 - /^2)722 + (J^ - /Wl)7l2 + S [W - 1^2) 

C'/^2 711 

0711 2 det r 2 711 2 

2 2 7fi 

5-^ -*-/,, \ Til ^ /T72 



(n + r + 5)-^ - -{Yi- 2^21^2 + /^^) 

6*722 2 detr 2722 2 

-UW - 2f,2W + ^i) - '-%Z^ - 2^iZ + 
2 27I2 

/ , , N 712 



(n + r + s)-^ - 71(^1^2 - {YifX2 + I2W) + m/"2) 



5712 det r 

+r^(Z2-2fi{Z + ^il) + s^{W-2fi2W + fil) (2) 
722 711 

Theorem 2.1. T/ie ML-degree of the hivariate normal missing data problem 
is equal to nine, and at least one of the critical solutions to (0j is real. 
Moreover, at least one such real critical solution is a local maximum in the 
statistically relevant parameter space. 

Proof. The theorem follows from a general principle about the number of 
complex solutions to a system of polynomial equations with parametric co- 
efficients. Namely, if such a system has N < 00 complex solutions (counted 
with multiplicity) for a "random" choice of parameter values then other 
random choices of parameter values will also produce complex solu- 
tions. Here we sketch a proof of this statement. Suppose / is an ideal 
in C{pi, . . . ,pk)[xi, . . . ,xt], the ring of polynomials in the indeterminates 
xi, . . . ,Xn with coefficients from the field of rational functions in pi, . . . 
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over C. Pick any term order and compute a Grobner basis G of I with re- 
spect to this term order. Now let U be the Zariski open set in C'^ such that 
no denominator of the coefficients and no initial coefficient of the polyno- 
mials encountered during the Buchberger algorithm that produces G vanish 
on any point in U. If p (z U then both the initial ideal of / and that of I{p) 
will have the same set of standard monomials: these are the monomials that 
no initial term in G and G{p), respectively, divide. It is a well-known result 
that I{p) has N < oo complex solutions (counted with multiplicity) if and 
only if the number of such standard monomials is A^. This implies that for 
all g G [/ the ideal I{q) will have complex solutions. 

Now in the setting of the critical equations ([2]) let J be the ideal gener- 
ated by the five polynomials obtained by clearing the denominators in ([2]). 
Furthermore, let K be the ideal generated by the product of these cleared 
denominators. Then the ML-degree we are after is the number of complex 
solution oi I = J : K. A random choice of n, r, s and data vectors yi, . . . , y„, 
zi , . . . , , and wi, . . . ^Wg, and a quick computation in Singular shows that 
I{n,r, s,y,w, z) has nine complex solutions. Our discussion above implies 
that the ML-degree of the bivariate normal missing data problem is nine. 
Since complex solutions to real polynomial equations come in complex con- 
jugate pairs, at least one must be a real solution. 

We can also see directly that there must be at least one real local max- 
imum inside the statistically relevant parameter space x PD2 (where 
PD2 denotes the space of 2 x 2 positive definite matrices). To see this, note 
that if any parameter has a large absolute value the log-likelihood function 
tends to —00. Similarly, if the S parameters approach the boundary of the 
positive definite cone the log- likelihood function tends to —00. Thus, the 
log-likehood function must have a local maximum inside x PD2. □ 

How many of the nine complex solutions in Theorem 12 . 1 1 can be real? We 
know that at least one is, but is it possible that there are three, five, seven, or 
nine? For various choices of the data parameters, we have observed that all 
of these values are possible. A more surprising fact is that the number of real 
solutions seems to be indicative of how well-specified the MAR assumption 
is. Here is a summary of the observations that emerge from our computations 
for which we have use Mathematica, Maple, and Singular We describe 
the separate cases in more detail in the paragraphs following the list. 

1. When the data was generated from a Gaussian or uniform distribution 
and the missing data mechanism was MCAR (missing completely at 
random) or MAR, we consistently observe exactly one real critical 
point, which is necessarily a local maximum. 

2. When the data was generated from a Gaussian distribution and the 



5 



missing data mechanism was NMAR (not missing at random) , we con- 
sistently observed three real critical points, all of which are in the sta- 
tistically relevant region {E? x PD2), of which two were local maxima. 

3. When the joint distribution of Y and the marginal distributions of W 
and Z were unrelated to each other by a natural censoring mecha- 
nism, we observed seven real critical points, of which three were in the 
statistically relevant region, and two were statistically relevant local 
maxima. 

4. When the twelve sufficient statistics (n, r, s, li, . . .) were generated 
randomly (without regard to an underlying distribution) we observed 
nine real critical points. 

Of course, we could not test all possible scenarios for the above data 
types, and there will always be the possibility that data generated by one of 
the strategies will have a different number of real solutions than we observed. 

When the censoring mechanism was MCAR, we did the censoring in 
the obvious way, by first generating data from a randomly chosen Gaussian 
distribution, and then censoring cell entries with the fixed probability ^. For 
a more general MAR scenario, we generated data by taking a mixture of the 
MCAR scenario, with the censoring mechanism that covariate X2 is not 
observed whenever Xi < —1. Out of 1000 runs of the MAR scenario 985 
cases produced a single real solution which is also a statistically relevant 
maximum. In fact, both of the above scenarios consistently had one real 
solution. 

For the NMAR censoring mechanism, we generated data from a random, 

strongly negatively-correlated Gaussian distribution, and censored covariate 
Xi when Xi < —1. Out of 1000 sample runs under this scenario 765 gen- 
erated three real solutions, all statistically relevant, with two being local 
maxima. 

For a family of "wild" examples, we choose Y and Z to be generated from 
the same Gaussian distributions with mean (0, 0) but W to be generated 
from a uniform distribution on the interval [5,6]. We tested this scenario 
with 1000 sample runs as well, and we observed 831 of them having seven 
real solutions, three of them statistically significant, with two local maxima. 

For the case of randomly generated data without regard to an underlying 
distribution we also did 1000 sample runs where we observed 134 cases with 
nine real critical sollutions. 

In summary, our computations suggest that the number of real solutions 
of the critical equations can be a gauge of how well the MAR assumption fits 
the data. For missing data sets with three or more covariates where direct 
computation of all critical points will not be possible, if the EM-algorithm 
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produces more than one local maximum, this might suggest that one should 
pay more careful attention to whether or not the MAR assumption makes 
sense for the data. 

3 Bivariate Discrete Random Variables 

In this section, we focus on the case where Xi and X2 are discrete multi- 
nomial random variables. We suppose that Xi € {1,2,... ,m} and X2 € 
{1, 2, . . . , n}. We give a combinatorial formula the ML-degree which shows 
that it grows exponentially as a function of m and n. 

In the bivariate multinomial case, the data can be summarized by a 
table of counts T which records the complete cases, and two vectors R and 
S which record the observations of only Xi and only X2, respectively. In this 
multinomial case, we want to estimate the raw probabilities pij = P{X\ = 
i,X2 = ])■ The log-likelihood function becomes 

m n m n 

£{R, S,T;p) =J2^ ^ij logKi + logPi+ + ^ Sj logp+j. (3) 

i=l j=l 1=1 j=l 

We want to find p that maximizes i{R, S, T; p) subject to p > and = 1. 

Theorem 3.1. The ML-degree of the bivariate multinomial missing data 
problem is equal to the number of bounded regions in the arrangement of 
hyperplanes {pij = 0,pi+ = 0,p+j = 0|i E [m],j € [n]} inside the hyper- 
plane = 1. Every solution to the score equations for is real. For 
generic R, S, T there is exactly one nonnegative critical point, and it is a 
local maximum. 

Proof. Maximizing the product of linear forms has a standard formula for 
the ML-degree as the number of bounded regions in the arrangement defined 
by these linear forms [2]. Each bounded region contains precisely one criti- 
cal solution which is real. Furthermore, since all the coordinate probability 
functions are linear in the parameters, the objective function is convex so 
there is exactly one nonnegative criticial point that must be a local maxi- 
mum. □ 

From Theorem 13.11 we see that to calculate the ML-degree we need to 
count the number of bounded regions in a hyperplane arrangement. The 
remainder of this section is devoted to performing this count. First we 
provide some definitions which allow us to state Theorem 13. 2i Then we 
proceed with the proof in a number of steps. 
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For integers k and I, the Stirling numbers of the second kind are the 
numbers 

' i=0 ^ ^ 

The negative index poly-Bernoulli numbers are the numbers: 
Bil,k) =^{-iy-H\Sil,i)ii + lf. 

1=0 

Theorem 3.2. The ML-degree of the bivariate multinomial m x n missing 
data problem is 

rn n / \ / \ 

ML(m, n) = Y, Y^^_^^m+n-k-l r\r\B{rn-Kn-l). (4) 

fc=0 1=0 \ / \ / 

For small values of m, we can explicitly work out formulas for this ML- 
degree. In particular, one can show that ML(2, n) = 2"+^ - 3. Since the 
ML-degree is monotone as a function of m and n, this shows that the ML- 
degree in the bivariate discrete case is exponential in the size of the problem. 
Let 

S = {p^j, I i € H U {+}, i G [n] U {+}} \ {p++} 

be the set of all hypcrplancs in the hyperplane arrangement that determines 
the ML-degree. Specifying a (possibly empty) region of the arrangement 
amounts to choosing a partition S = N U P. The resulting open region on 
the hyperplane = 1 consists of all matrices p such that pij < if pij G A'^ 
and Pij > if pij G P and Ylii jPij — ^- We denote this set of matrices by 
M. {N, P) . Our goal is characterize and count the partitions N Li P such 
that M{N,P) is nonempty and bounded. We prove a sequence of results 
classifying the type of subconfigurations that can appear in N and P. 

Lemma 3.3. Leti,k G [m] with i ^ k and I G [n] with j ^ I. Suppose that 
PijiPkl £ ^ o,i^dpii,pkj G P. Then if M.{N,P) is nonempty it is unbounded. 

Proof. Let eij denote the mxn matrix with a one in the ij position and zeros 
elsewhere. Suppose that p G M{N, P). Then p + a{eii + ekj — eij — eki) G 
M{N, P) for all a > since adding a(ej; -|- e^j — e^j — e^i) does not change 
the sign of any entry of p nor does it change any of the margins pi+ of 
p^j . Thus A4 {N, P) contains matrices with arbitrarily large entries and is 
unbounded. □ 

Let N' = N n {pij \ i e[m],j e [n]} and P' = P D {pij \ i G [m],j G [n]}. 
A partition A = (Ai,...,Am) is a nonincreasing sequence of nonnegative 
integers. The length of A is m (we allow zeros in the partition). 



8 



Lemma 3.4. Suppose that A4{N, P) is nonempty and bounded. There exists 
a permutation a of the rows and columns of p and a partition A such that 

a{N') = {pij\j <\}. 

The same is true for P' and for every rectangular suhmatrix of p. 

Proof. After permuting rows we may assume that the number of elements 
in row i, Aj, is a nonincreasing sequence. Permuting the columns we may 
suppose that the only elements of A^' in the first row of p are pn, . . . ,Piai ■ 
Permuting columns further, we may assume that the elements in the second 
row are of the form p2i, . . . ,P2X2 with A2 < Ai. There could not be any 
element of the form p2j € A^' with j > Ai because otherwise there would be 
more entries in row two than row one or A^' would contain pi\-^ , p2j and P' 
would contain pij,P2Xi which violates Lemma 13.31 Repeating the argument 
for each row shows that A4{N, P) can be put into partition form. □ 

Lemma 3.5. Suppose that A4{N, P) is nonempty and bounded. Thenpi^,p^j G 
P for all i and j. 

Proof. Suppose that M.{N,P) is nonempty and N contains, say, p+i. We 
will show Ai{N,P) is unbounded. To do this, it suffices to show that 
there exist points on the boundary of Ai{N,P) with coordinates of arbi- 
trarily large absolute values. Furthermore, we will assume that A4{N,P) 
is bounded (so that we can make liberal use of Lemmas 13.41 and 13. 3p and 
derive a contradiction. The boundary of A4{N,P) is described by allowing 
the strict inequalities to become weak inequalties. There are four cases to 
consider. 

Case 1. Suppose that there is no i such that pi^ G N. After permuting 
columns and rows we may suppose that p+j G if and only if j G [k]. If 
A4{N, P) is to be nonempty, we must have k < m. 

After permuting row and columns in such a way that the set of the first 
k columns is mapped to itself, we may suppose that the set of variables in 
A^ belonging to the submatrix p[l,m;l,fc] is in partition form, according 
to Lemma 13.41 If A4{N,P) is to be nonempty, it must be the case that 
Pij G A^ for all j G [k] since the first row is the longest row of the tableau. 
As pi+ G P, there must exist pn G P with / > k. Then consider the matrix 
p' with p'^i = —a, pij = a + 1 and pij = for all other i,j. This matrix 
satisfies all requirements to belong to the boundary of A4{N, P). Letting a 
tend to infinity shows that Ai{N,P) is unbounded, a contradiction. 

For the remaining three cases, we assume that there exists some i and j 
such that pi^,p^j G A^. After permuting rows and columns we may suppose 
there is A; < m and / < n such that pj+ G A^ if and only if i G [k] and 
p+j G A^ if and only if j G [/] . 
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Case 2. Suppose that there is a pij G N with i G [A;] and j G [Z] and a 
Pi'ji G P with i' G [fc + l,m] and / G [Z + l,n]. Then the matrix p' with 
Pij = —a, pi'j' = a + 1 and all other entries equals satisfies the requirements 
to belong to the boundary of Ai{N,P). Letting a tend to infinity shows 
that Ai{N,P) is unbounded, a contradiction. 

Case 3. Suppose that pij G P for all i G [k] and j G [/]. Since M{N, P) 
is nonempty, and pi^ G N for all i G [fc], we can find, for each i G [fc], a 
j € [I + l,n] such that G A^. As A^(Af, P) is bounded, this implies that 
we can permute rows and columns of the matrix p, so that p[l,k;l + l,n] 
is mapped into itself and so that this submatrix, intersected with N is of 
tableau form. With these assumptions, we must have pu+i G N for all 
i G [k]. Since G P, there must exist pi'i+i G P with i' € [k + l,m\. 

Now consider the matrix p' with = —a, p^/;^^ = a + 1 and all other 

entries equal to zero. This matrix satisfies all requirements for belonging to 
the boundary of M.{N, P) but as a tends to infinity shows that M{N, P) is 
unbounded. 

Case 4- Suppose that pij G N for all z G [k + l,m] and j e [I + 
l,n]. This is equivalent to saying that for all pij G P, pi+ and p+j are not 
simultaneously in P. If we permute rows and columns of p so that P is in 
tableau form, this condition is equivalent to saying that there is a pi/j' G P 
such that ^ P and none of pi+ nor p^j are in P for i < i' and 

J < j'- (Note that one of i' or j' might be zero, which will work fine in the 
following argument.) Then for any matrix p G M{N, P) we have 

i' f 

i=i j=i 

i' j' m j' i' n 

i=l j=l i=i'+l j=l i=l j=j'+l 

The expression at the end of this equation involves the sum, with positive 
coefficients, of all pij G P. Since the pij in the sum with pij G N all occur 
with coefficient 1, and since p++ = 1, we deduce that this sum must be 
strictly greater than 1. Thus M-{N, P) must be empty. □ 

Lemma 3.6. Let X be a partition of length m such that Xi < n — 1 for all 
i, and Xm = 0. Let N{X) = {pij\j < AJ and P(A) = S\ N{X). Then 
Ai{N{X), P{X)) is nonempty and bounded. 

Proof. To show that A^(A^(A), P(A)) is nonempty amounts to showing that 
there is a table p with nonzero entries that satisfies all the constraints pij < 
if Pij G N{X), Pij > if Pij G P(A) and p++ = 1. To this end, let e > be a 
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small real number. Define the matrix p(e) by the following rules: 

— e 



p{e) 



ifpi,- GiV(A) 

e if pij G -P(A) and i < m, j < n 

me if i = m,j < n 

ne if i < m, j = n 

1 — (3mn — 2m — 2n + 1 — 2 Afc)e if i = m, j = n 

By construction, p(e) E 7W(A^, P). 

Now we show that A4{N{X), P{X)) is bounded. For each A; E [m — 1] 
with Afe > we have 

k Afe 

< ^Pi+ + ^p+j 
i=i j=i 

k Xk m Xk k n 

= + Yl 12p'^ + Y1 Y1 

i=i j=i i=k+i j=i i=i i=Afe+i 



which implies that 

k Afe 
i=l j=l 



k Afc 



Afc 



^ Y.Y.pi^+ Yl Hp'^ + Y. Y1 P'l 

i=l j=l i=k+l j=l i=l j=Xif+l 

m n 

< T.T.p^j = ^- 



i=l 3=1 

Since pij G A^(A) whenever i £ [k] and j G [A^], we deduce that 

k Afe 

i=i j=i 

and thus — 1 < Pij < 0. Since every pij G A^(A) belongs to such a sum for 
some k, we see that pij is bounded for all pij G -/V(A). This implies that pij 
is bounded for all pij G -P(A) as well, since, p++ = 1. Thus, A4{N{\), P{X)) 
is bounded. □ 

To finish the proof, we use a result from the Master's thesis of Chad 
Brewbaker jl], that counts a family of 0/1 matrices that are closely related 
to the set N, P that have M{N, P) bounded. 

Theorem 3.7. The number o/ 0/1 m x n matrices A such that no 2 x 



2 submatrix of A is either 
Bernoulli number B{m,n). 



1 
1 



or 



1 

1 



is the negative index poly- 
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The 0/1 matrices in the theorem are known as lonesum matrices because 
they are the 0/1 matrices that are uniquely specified by their row and column 
sums. 

Proof of Theorem VJ. 2\ According to Lemmas 13. 3| 13.51 and 13.61 we must count 
sets C {pij I i € [rnljj £ [n]} with certain properties. Interpreting N as 
a 0/1 matrix where M where Mij = 1 \ipij € A, we see that we must count 



the matrices M that do not have any 2x2 submatrices equal to 



that no row or column of M could be all ones (otherwise, we would have, 
for example, pij < for all j but > which implies that A4{N,P) is 
empty) . Because of the fact that each such set A can be rearranged into 
a partition, and after switching the zeros and ones, this is the same as the 
number of lonesum 0/1 mxn matrices which have all row and column sums 
positive. Thus, the number M{m, n) can be obtained from the negative 
index poly-Bernoulli numbers B{m,n) by inclusion-exclusion which yields 
the desired formula (jl]). □ 
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